
Preprocessor Options

Welcome to the jungle, also known as the ControlNet Preprocessor node. This is where your image gets poked, prodded, outlined, depth-mapped, or otherwise tortured into a usable conditioning map for ControlNet. The preprocessor field is arguably the most important setting in this node, because it decides what kind of conditioning map gets extracted from your image.

Each preprocessor option here corresponds to a different algorithm or pipeline used to extract structural, semantic, or stylistic features from an image. These features are then fed into a ControlNet model to guide image generation with a specific type of constraint (e.g., edges, poses, segmentation maps, depth, etc).


🛠️ Setting: preprocessor

🔧 Requirements

  • Input: A valid image (some require RGB, some prefer grayscale).
  • Dependencies: Many of these preprocessors rely on external Python libraries like OpenCV, PyTorch, Detectron2, etc. If you get errors, you’re probably missing one.
  • Resolution: Works best with input resolution in the 512–1024px range. Too small and fine detail gets lost; too big and it might just explode your VRAM (see the resize sketch below).
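
If you want to do that resize step outside the graph, here's a minimal sketch using Pillow; the 768 target is just an arbitrary pick inside that range, not a magic number.

```python
from PIL import Image

def resize_for_preprocessing(path: str, target_long_edge: int = 768) -> Image.Image:
    """Scale the longest edge into the 512-1024 sweet spot before preprocessing.
    target_long_edge=768 is an arbitrary middle-of-the-range choice."""
    img = Image.open(path).convert("RGB")
    scale = target_long_edge / max(img.size)
    new_size = (round(img.width * scale), round(img.height * scale))
    return img.resize(new_size, Image.LANCZOS)

# conditioning_source = resize_for_preprocessing("input.png")
```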

📚 Available Preprocessor Options

Let’s go way too deep into each one:


none

  • What It Does: Skips preprocessing. Sends the raw image straight to ControlNet.
  • Use Case: If you’ve already prepared your conditioning image manually or externally.
  • Strengths: No overhead, total control.
  • Weaknesses: No built-in structure guidance.
  • Tip: Use when your ControlNet or workflow expects a raw guidance image (e.g. mask-guided or image-to-image setups).

canny

  • What It Does: Applies the Canny edge detection algorithm.
  • Needs: A clean image; edge contrast is key.
  • Strengths: Fast, clean, and sharp outlines.
  • Weaknesses: Overly simplistic for complex forms; loses context.
  • Ideal For: Architectural designs, lineart sketches, hard-edged compositions.
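
Under the hood this is the classic OpenCV operator, so you can prototype your edge map outside ComfyUI; a minimal sketch (the 100/200 thresholds are common starting values, not necessarily what the node uses):

```python
import cv2

# Canny works on single-channel input, so load as grayscale.
img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Low/high hysteresis thresholds; 100/200 are illustrative defaults.
edges = cv2.Canny(img, 100, 200)

cv2.imwrite("canny_map.png", edges)
```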

canny_pyra

  • What It Does: Pyramid-based Canny edge detection—uses multiscale processing for more edge detail.
  • Strengths: Better at picking up fine and coarse edges.
  • Weaknesses: Slightly noisier and slower than basic canny.
  • Ideal For: Photographic textures, layered compositions.

lineart, lineart_anime, lineart_manga, lineart_any

All use a neural net trained to extract linework, but with specific style biases.

  • lineart: Generic lineart extractor.
    • Great for: Stylized outlines, comic book art.
  • lineart_anime: Biased for smooth, cell-shaded anime contours.
    • Great for: 2D anime characters.
  • lineart_manga: Biased toward thick-thin black & white linework typical of manga panels.
    • Great for: Black & white manga workflows.
  • lineart_any: Trained for multiple styles, most versatile.
    • Great for: Mixed media projects.

Common Strengths: Stylized structure for cartoon and inked looks.
Common Weaknesses: May hallucinate or drop edges on photo inputs.


scribble, scribble_xdog, scribble_pidi, scribble_hed

All reduce the image to simplified strokes—good for abstraction and creativity.

  • scribble: Raw sketch-like edge map.
  • scribble_xdog: Uses the eXtended Difference of Gaussians (XDoG) operator for softer, dreamlike edges.
  • scribble_pidi: Uses PiDiNet to produce refined sketch maps.
  • scribble_hed: HED-based scribble extraction.

Strengths: Great for creative workflows, AI doodles, or abstract image-to-image tasks.
Weaknesses: Not ideal for realism or detail preservation.
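
For a feel of what xdog is doing, here is a rough standalone approximation of an XDoG-style sketch; the real preprocessor's parameters and post-processing differ, and every constant below is an illustrative guess.

```python
import cv2
import numpy as np

def xdog_sketch(path, sigma=0.8, k=1.6, gamma=0.98, threshold=0.1):
    """Rough XDoG-style edge map: difference of two Gaussian blurs, then a
    hard threshold. All parameter values here are illustrative guesses."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    g1 = cv2.GaussianBlur(gray, (0, 0), sigma)        # fine blur
    g2 = cv2.GaussianBlur(gray, (0, 0), sigma * k)    # coarse blur
    dog = g1 - gamma * g2                              # difference of Gaussians
    edges = np.where(dog > threshold, 1.0, 0.0)        # hard threshold to strokes
    return (edges * 255).astype(np.uint8)

cv2.imwrite("xdog_map.png", xdog_sketch("input.png"))
```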


hed

  • What It Does: Holistically-Nested Edge Detection (HED).
  • Strengths: Clean contours, preserves semantic shapes.
  • Weaknesses: May blur tight edge detail.
  • Use Case: Ideal for sketches, pose interpretation, soft outlines.

pidi

  • What It Does: Uses PiDiNet (Pixel Difference Network) for refined semantic edge detection.
  • Strengths: Sharp, content-aware edges.
  • Weaknesses: More VRAM usage.
  • Use Case: Balanced stylized realism.

mlsd

  • What It Does: Extracts straight line segments using M-LSD (Mobile Line Segment Detection).
  • Strengths: Architectural precision.
  • Weaknesses: Useless on organic shapes.
  • Use Case: Buildings, interiors, mechanical schematics.
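
MLSD itself is a learned line-segment detector, but a classical probabilistic Hough transform gives a rough feel for the kind of straight-line map it produces; this is a stand-in sketch, not the actual M-LSD model, and the thresholds are arbitrary.

```python
import cv2
import numpy as np

img = cv2.imread("building.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 50, 150)

# Probabilistic Hough transform as a classical stand-in for M-LSD.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=40, maxLineGap=10)

line_map = np.zeros_like(img)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(line_map, (x1, y1), (x2, y2), 255, 2)

cv2.imwrite("line_map.png", line_map)
```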

pose, openpose, dwpose, pose_dense, pose_animal

  • pose / openpose: Human pose detection (skeleton keypoints).
  • dwpose: Deep Whole-body Pose Estimation, better foot/hand coverage.
  • pose_dense: Adds facial landmarks and dense joints.
  • pose_animal: Pose estimation for animals.

Strengths: Gives precise figure structure.
Weaknesses: Can miss limbs in weird angles or crowded scenes.
Use Case: Character design, pose reference, animation base.
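
If you want the same kind of pose map outside the graph, the controlnet_aux package publishes comparable annotators; a sketch assuming that package and the lllyasviel/Annotators weights are available (the ComfyUI node may wrap a different build).

```python
# pip install controlnet_aux   (assumed; downloads weights on first run)
from PIL import Image
from controlnet_aux import OpenposeDetector

detector = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
image = Image.open("person.png").convert("RGB")

# Ask for hand and face keypoints as well, where the detector supports them.
pose_map = detector(image, hand_and_face=True)
pose_map.save("pose_map.png")
```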


normalmap_bae, normalmap_dsine, normalmap_midas

  • What It Does: Converts an RGB image into a normal map (pseudo-3D surface orientation).
  • Strengths: Good for 3D-aware effects, lighting guidance.
  • Weaknesses: Doesn't capture real geometry; the surface orientation is inferred.
  • Differences:
    • bae: Balance of detail and softness.
    • dsine: May emphasize curvature.
    • midas: Normals derived from MiDaS depth estimates.
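
These preprocessors run dedicated normal-estimation networks, but the idea of a normal map, surface orientation packed into RGB, can be sketched from a depth image with plain numpy gradients; this toy version only illustrates the encoding.

```python
import cv2
import numpy as np

def depth_to_normals(depth_path, strength=2.0):
    """Toy normal map from a grayscale depth image. BAE/DSINE predict normals
    directly from RGB; this only shows the X/Y/Z-to-RGB packing. 'strength'
    is an arbitrary scale factor."""
    depth = cv2.imread(depth_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    dy, dx = np.gradient(depth)                       # surface slope in y and x
    normals = np.dstack((-dx * strength, -dy * strength, np.ones_like(depth)))
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)  # unit vectors
    rgb = ((normals * 0.5 + 0.5) * 255).astype(np.uint8)       # map [-1,1] -> [0,255]
    return cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)

cv2.imwrite("normal_map.png", depth_to_normals("depth_map.png"))
```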

depth, depth_anything, depth_anything_v2, depth_anything_zoe, depth_zoe, depth_midas, depth_leres, depth_metric3d, depth_meshgraphormer

These are your depth prediction models.

  • depth_anything / v2: Based on Depth Anything models.
  • depth_zoe / anything_zoe: Use ZoeDepth for sharper predictions.
  • depth_midas: Good general-purpose.
  • depth_leres: LeReS network—very accurate, but slower.
  • depth_metric3d: Metric depth prediction.
  • depth_meshgraphormer: Hand mesh reconstruction, mainly used to generate depth maps for refining hands.

Strengths: Amazing for 3D-aware composition, lighting.
Weaknesses: Long processing time, can create depth artifacts.
Use Case: Landscapes, portraits with background variation.
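
For a quick standalone depth map, MiDaS is available through torch.hub; a sketch assuming the intel-isl/MiDaS hub entry point (weights download on first run), and note the raw prediction comes back at the transform's resolution, not your input size.

```python
import cv2
import torch

# Small MiDaS model via torch.hub.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
with torch.no_grad():
    depth = midas(transform(img)).squeeze().cpu().numpy()

# Normalize to 0-255 so it can be saved and reused as a conditioning image.
depth = (255 * (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)).astype("uint8")
cv2.imwrite("depth_map.png", depth)
```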


seg_ofcoco, seg_ofade20k, seg_ufade20k, seg_animeface

Semantic segmentation:

  • seg_ofcoco: OneFormer segmentation trained on COCO (general objects: people, cars, etc.).
  • seg_ofade20k: OneFormer scene parsing trained on ADE20K.
  • seg_ufade20k: UniFormer segmentation trained on ADE20K.
  • seg_animeface: Segment anime faces into parts.

Strengths: Structural conditioning by regions.
Weaknesses: Overlaps/ambiguities in masks.
Use Case: Face swapping, region-specific generation.


shuffle

  • What It Does: Scrambles image tiles to create chaotic conditioning.
  • Strengths: Adds randomness and variation.
  • Weaknesses: Not deterministic.
  • Use Case: Style transfer, experimentation, glitch art.
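
A tile-scramble version of the idea is easy to sketch with numpy; the actual preprocessor may shuffle content differently, so treat this as an illustration of "chaotic conditioning", not its exact algorithm.

```python
import cv2
import numpy as np

def shuffle_tiles(path, grid=8, seed=None):
    """Chop the image into a grid x grid mosaic and shuffle the tiles.
    Pass a seed if you want the chaos to at least be repeatable."""
    rng = np.random.default_rng(seed)
    img = cv2.imread(path)
    h, w = img.shape[:2]
    th, tw = h // grid, w // grid
    tiles = [img[r*th:(r+1)*th, c*tw:(c+1)*tw]
             for r in range(grid) for c in range(grid)]
    order = rng.permutation(len(tiles))               # shuffled tile order
    tiles = [tiles[i] for i in order]
    rows = [np.hstack(tiles[r*grid:(r+1)*grid]) for r in range(grid)]
    return np.vstack(rows)

cv2.imwrite("shuffle_map.png", shuffle_tiles("input.png", seed=42))
```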

teed

  • What It Does: TEED (Tiny and Efficient Edge Detection) edge maps.
  • Strengths: Lightweight and fast; soft but detailed contours.
  • Weaknesses: Less semantic awareness than heavier detectors like HED or PIDI.
  • Use Case: Stylized or structure-aware compositions where speed matters.

color

  • What It Does: Extracts dominant color regions.
  • Strengths: Great for color-guided generation.
  • Weaknesses: No structure, no lines.
  • Use Case: Style transfer, palette preservation.
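
The color map is basically the image with structure thrown away and only coarse color blocks left. A minimal stand-in using OpenCV resizing (the real preprocessor may pick its grid size differently; cells=64 is an assumption):

```python
import cv2

def color_grid(path, cells=64):
    """Reduce the image to a coarse color-block grid: average colors down,
    then blow them back up with nearest-neighbour so the blocks stay flat."""
    img = cv2.imread(path)
    h, w = img.shape[:2]
    small = cv2.resize(img, (cells, cells), interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)

cv2.imwrite("color_map.png", color_grid("input.png"))
```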

sam

  • What It Does: Uses Meta’s Segment Anything Model (SAM) to create mask regions.
  • Strengths: Ultra-precise, works on nearly any object.
  • Weaknesses: May require manual refinement.
  • Use Case: Compositional control, background editing, multi-subject control workflows.
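
To poke at SAM directly, the segment-anything package exposes an automatic mask generator; a sketch assuming that package plus a downloaded ViT-B checkpoint file.

```python
# pip install segment-anything   (plus the sam_vit_b checkpoint, assumed local)
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("input.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts: 'segmentation', 'area', 'bbox', ...

print(f"SAM found {len(masks)} regions")
```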

🧪 Prompting Tips

  • Pair your preprocessor with ControlNet models designed to accept its output (e.g., use canny with a Canny-trained ControlNet model).
  • For style workflows (anime, manga), combine a stylized preprocessor with a matching LoRA and prompt style.
  • Want consistency? Use the same preprocessor + seed + conditioning image across multiple runs.

🚫 What-Not-To-Do-Unless-You-Want-a-Fire

Oh, so you like chaos? You enjoy watching your GPU cry? Great, then here's what not to do with the preprocessor setting unless you're actively trying to summon the AI demons of instability:

❌ Use the wrong preprocessor with the wrong ControlNet model

You wouldn't feed a cat spaghetti and expect it to do math. Likewise, don't feed pose_animal output into a ControlNet trained for depth_midas. The result? Nonsense conditioning, wasted steps, and outputs that look like AI had an existential crisis.

Fix: Always match your preprocessor with its sibling ControlNet (e.g., hed → HED model, depth_anything → ControlNet trained on Depth Anything).

❌ Forget to install dependencies

Half of these preprocessors are built on third-party magic. Missing detectron2, segment-anything, openpose, or opencv? You’ll get red errors, blank images, or worse: success that isn’t actually success.

Fix: Check your install. Use a requirements.txt file. Don’t YOLO this.
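
Something like the following covers the usual suspects; the package names below are illustrative, so defer to your preprocessor node pack's own requirements.txt for the authoritative list.

```
# Illustrative, not official: check your node pack's requirements.txt.
opencv-python
torch
torchvision
controlnet-aux
segment-anything
# detectron2 usually has to be installed separately; it is not a simple pip pin.
```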

❌ Run high-res images through depth_leres or sam on 8GB VRAM

If you're running a potato laptop with a fancy GPU sticker but no actual power, please don’t crank depth_leres or sam to 2048x2048. These models will eat your VRAM and then casually torch your runtime with an out-of-memory error.

Fix: Stay under 1024x1024 unless you’re packing real heat.

❌ Expect perfect outlines from scribble_xdog on low-contrast images

Low contrast images + xdog = muddy soup. It’s not a “dreamlike sketch,” it’s a failed art student’s nightmare.

Fix: Boost your image contrast before applying xdog.

❌ Use shuffle and expect consistency

Shuffle does what it says—it shuffles. It’s not a structured preprocessor, it’s an agent of chaos.

Fix: Don’t use it unless you want variety over control. Never in production workflows. Ever.

❌ Assume pose_dense will get every joint right

If your character is lying down, twisted, or facing away from the camera, pose_dense might just give up entirely. Expect floating limbs and mysterious spaghetti arms.

Fix: Stick with standard pose or dwpose for more stable results. Always validate visually.

❌ Mix multiple preprocessors on the same conditioning channel

Unless your ControlNet expects a specific composite input (and you really know what you're doing), mixing outputs like depth + canny into the same ControlNet model is like throwing oil and water into a blender—loud, messy, and completely ineffective.

Fix: One preprocessor, one ControlNet, per channel. Keep your chaos modular.

❌ Skip normalization when using normalmap_*

Feeding an unnormalized or overly bright image into a normalmap extractor? Get ready for washed-out normals or weird lighting shadows.

Fix: Preprocess with tone mapping or exposure correction first.

❌ Rely on seg_* for precision mask work

Semantic segmentation ≠ accurate masking. These models often blur edges or clip object boundaries. Don’t use them if you're trying to do surgical precision work like inpainting hair strands.

Fix: Use sam instead. It’s designed for precision.

❌ Forget that more preprocessing ≠ better results

Yes, we know—it’s tempting to run every image through five preprocessors, load five ControlNets, and see what happens. But you’ll probably just get noise, hallucinations, or broken anatomy.

Fix: Be deliberate. Preprocessors are tools, not spice blends. Pick the one that suits your task, and leave the rest out of your stew.

And finally:

🔥 Don’t forget to laugh when it breaks

This is ComfyUI. If something goes wrong and you get AI soup or a melted mannequin, remember: it’s not a bug, it’s a rite of passage.